An Approach to the Automatic Acquisition of Phonotactic Constraints

Author

  • Anja Belz
Abstract

This paper describes a formal approach and a practical learning method for automatically acquiring phonotactic constraints encoded as finite automata. It is proposed that the use of different classes of syllables with class-specific intra-syllabic phonotactics results in a more accurate hypothesis of a language's phonological grammar than the single syllable class traditionally used. Intra-syllabic constraints are encoded as acyclic finite automata with input alphabets of phonemic symbols. These automata in turn form the transitions in cyclic finite automata that encode the inter-syllabic constraints of word-level phonology. A genetic algorithm is used to automatically construct finite automata from training sets of symbol strings. Results are reported for a set of German syllables and a set of Russian bisyllabic feminine nouns.

1 Background

1.1 Phonotactic Description

In recent years, phonology (partly under the influence of computational models) has moved away from procedural, rule-based approaches towards explicitly declarative statements of the constraints that hold on possible phonological forms. Such statements form sets of constraints that apply at a given level of description, and ill-formedness is often defined as constraint violation. Phonotactic descriptions state the constraints that hold for possible sequences of phonetic or phonemic features or symbols, usually at the level of the syllable (more rarely for onset, peak and coda separately). Phonological words are defined as sequences of at least one syllable.

A phonotactic description is typically thought to be adequate only if it generalises beyond the set of phonological forms that exist in a language to a superset of possible forms that also includes forms that could exist but do not. This distinction between non-existent but possible forms on the one hand, and non-existent and impossible forms on the other, is often described in terms of accidental vs. systematic gaps (e.g. …

… lists five encoding schemes for phonotactic description at the syllable level found in the literature: templates that merely state the number of consonants permitted in the onset and coda, distribution matrices with a separate matrix for each type of consonant cluster, enhanced templates which add the notion of phoneme classes, feature-based phonotactic networks using feature bundles, natural classes, variables, defaults and underspecification, and phrase-structure rules which have the same (potential) representative power as feature-based phonotactic networks. In the approach presented here, finite state automata (FSA) encoding is preferred over these other schemes, since they can all equivalently be represented by FSAs: what is described is always a regular …
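The two-level encoding described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two syllable classes (`STRESSED`, `UNSTRESSED`), the phoneme symbols, the state numbering and the helper names `accepts` and `word_accepts` are hypothetical and are not the automata or the learning method reported in the paper. The sketch only shows how acyclic syllable-level automata could serve as transition labels in a cyclic word-level automaton.

```python
# Toy illustration only: all automata, symbols and class names below are
# hypothetical assumptions, not taken from the paper.

def accepts(fsa, symbols):
    """Return True if the deterministic automaton accepts the symbol sequence."""
    state = fsa["start"]
    for sym in symbols:
        state = fsa["delta"].get((state, sym))
        if state is None:
            return False
    return state in fsa["final"]

# Acyclic syllable-level automata over phonemic symbols (one per syllable class).
STRESSED = {  # optional onset, vowel "a"/"o", optional coda
    "start": 0,
    "final": {2, 3},
    "delta": {
        (0, "t"): 1, (0, "k"): 1,
        (0, "a"): 2, (0, "o"): 2, (1, "a"): 2, (1, "o"): 2,
        (2, "n"): 3, (2, "t"): 3,
    },
}
UNSTRESSED = {  # optional onset, vowel "e", no coda
    "start": 0,
    "final": {2},
    "delta": {(0, "t"): 1, (0, "n"): 1, (0, "e"): 2, (1, "e"): 2},
}
CLASSES = {"STRESSED": STRESSED, "UNSTRESSED": UNSTRESSED}

# Cyclic word-level automaton whose transitions are labelled with syllable
# classes: one stressed syllable followed by any number of unstressed ones.
WORD = {
    "start": 0,
    "final": {1},
    "delta": {(0, "STRESSED"): 1, (1, "UNSTRESSED"): 1},
}

def word_accepts(word_fsa, classes, phonemes):
    """Try every split of the phoneme string into syllables that the
    word-level automaton can traverse, using the class automata as labels."""
    def search(state, pos):
        if pos == len(phonemes):
            return state in word_fsa["final"]
        for (src, label), dst in word_fsa["delta"].items():
            if src != state:
                continue
            for end in range(pos + 1, len(phonemes) + 1):
                if accepts(classes[label], phonemes[pos:end]) and search(dst, end):
                    return True
        return False
    return search(word_fsa["start"], 0)
```

In this toy grammar, `word_accepts(WORD, CLASSES, "tane")` succeeds, while `"tata"` is rejected because the assumed unstressed class does not license the vowel "a". A single shared syllable class could not express that kind of class-specific restriction.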

Similar resources

A Language Independent Approach To Acquiring Phonotactic Resources for Speech Recognition

Building and developing linguistic resources for languages is of prime importance, with many areas of application. This paper focusses on a fully automatic approach to the acquisition of syllable phonotactics for a particular language. In this approach the phonotactic constraints for a language are encoded in a finite-state phonotactic automaton the structure of which can be automatically deriv...

Generalisation in the Automatic Acquisition of Phonotactic Resources

Once acquired, linguistic resources for languages can be used to develop speech applications for the languages under consideration. This paper presents a fully automatic approach to the acquisition of phonotactic resources from syllable labelled data sets. While the technique requires no user intervention, the quality of acquired resources is heavily dependent on the nature and content of the s...

Phonetic knowledge, phonotactics an automatic language id

This study explores a multilingual phonotactic approach to automatic language identification using Broadcast News data. The definition of a multilingual phoneset is discussed and an upper limit on the performance of the phonotactic approach is estimated by eliminating any degradation due to recognition errors. This upper bound is compared to automatic language identification based on a phonotac...

The emerging lexicon of children with phonological delays: phonotactic constraints and probability in acquisition.

The effects of phonotactic constraints (i.e., the status of a sound as correctly or incorrectly articulated) and phonotactic probability (i.e., the likelihood of a sound sequence) on lexical acquisition have been investigated independently. This study investigated the interactive influence of phonotactic constraints and phonotactic probability on lexical acquisition in 3 groups of children: chi...

Effect of nonlinear pedagogy on the performance of the short backhand serve of badminton

Motor learning, or the acquisition of coordination, is a process of searching for stable functional coordination patterns into which a system can settle during a task or activity. Humans, as complex creatures, can choose the best pattern for the conditions from among different coordination patterns and so achieve the goals of a task. The purpose of this study is therefore to determine the effect of a Nonlin...

Publication date: 1998